Unsupervised Consonant-Vowel Prediction over Hundreds of Languages
Authors
Abstract
In this paper, we present a solution to one aspect of the decipherment task: the prediction of consonants and vowels for an unknown language and alphabet. Adopting a classical Bayesian perspective, we perform posterior inference over hundreds of languages, leveraging knowledge of known languages and alphabets to uncover general linguistic patterns of typologically coherent language clusters. We achieve an average accuracy of 99% on the unsupervised consonant/vowel prediction task across 503 languages. We further show that our methodology can be used to predict more fine-grained phonetic distinctions. On a three-way classification task between vowels, nasals, and non-nasal consonants, our model yields an unsupervised accuracy of 89% across the same set of languages.
Similar resources
Predicting Consonant Intelligibility in Normal-Hearing Listeners Using Microscopic Models with Different Distance Measures in an Automatic Speech Recognizer
In this study, recognition rates of consonants available in vowel-consonant-vowel structure in hearing tests and two microscopic models will be investigated. Such a syllable structure doesn’t exist in Farsi and Azerbaijani languages, but since the goal is only recognition of middle phoneme, according to hearing tests, listeners are able to properly recognize phonemes in clean speech conditions....
Assimilation of Final Low Back Vowel in Eghlidian Dialect
In this article, the low back vowel /A/ in word-final positions in Eghlidian dialect, one of Persian dialects, is studied. This vowel is represented phonetically as [A], [o] and [@] in different phonetic environments. Therefore many words were collected via interviewing ten native speakers so that these different alternant forms can be accounted for appropriately. Since one of the authors of th...
Study on the Anticipatory Coarticulatory Effect of Chinese Disyllabic Sequences
In this study, the Vowel-to-Vowel (V-to-V) coarticulatory effect in the Vowel-Consonant-Vowel (VCV) sequences is investigated, and the F2 offset value of the first vowel is analyzed. Results show that, in the trans-segment context, anticipatory coarticulation exists in Chinese. Due to high articulatory strength of aspirated obstruents, in the context of subsequent vowel /i/, the V1 F2 offset va...
Recognition of Tamil Syllables Using Vowel Onset Points with Production, Perception Based Features
Tamil Language is one of the ancient Dravidian languages spoken in south India. Most of the Indian languages are syllabic in nature and syllables are in the form of Consonant-Vowel (CV) units. In Tamil language, CV pattern occurs in the beginning, middle and end of a word. In this work, CV Units formed with Stop Consonant – Short Vowel (SCSV) were considered for classification task. The work ca...
Issues of phonological complexity: Statistical analysis of the relationship between syllable structures, segment inventories and tone contrasts
It is often suggested that languages are likely to ‘compensate’ complexity in one subsystem by simplicity elsewhere. In this paper evidence against this idea is presented by examining several subsystems of the basic phonology in a set of over 600 languages selected to represent genetic and areal diversity. The relationships between elaboration of the syllable canon, the size of segment inventor...
Journal title:
Volume, issue:
Pages: -
Publication date: 2013